July 23, 2018

Thanks to my co-authors:

  • Tarek Haddad, Medtronic
  • Xuefeng Li, CDRH/FDA
  • Ram Tiwari, CDRH/FDA
  • Rajesh Nair, CDRH/FDA
  • Jianxiong Chu, CDRH/FDA

Outline

  • Review of Conditional Power Prior and Discount Prior Method
  • Comments on Discount Prior Method from a Regulatory Perspective
  • Potential variations for Estimating Power Parameter
    • Using External Data in place of Current Data
    • Using Interim Data in place of entire Current Data
    • Modified Measure of Similarity of Current & Prior Data
  • Simulation Results

Review of Conditional Power Prior

Prior Distribution given \(\scriptsize{D_0}\)

\[\color{black}{\large{ {\pi ^{CPP}}(\theta |{D_0},{\alpha _0}) \propto {L_0}{(\theta |{D_0})^{{\alpha _0}}}\pi (\theta )}} \]

Posterior Distribution given \(\scriptsize{D}\)

\[\color{black}{\large{\pi (\theta |D,{{D}_{0}},{{\alpha }_{0}})\propto {{L}_{0}}{{(\theta |{{D}_{0}})}^{{{\alpha }_{0}}}}\pi (\theta )L(\theta |D)}}\]

Prior with two prior datasets \(\scriptsize{D_0}\) and \(\scriptsize{D_1}\)

\[\color{black}{\large{\begin{align} & {{\pi }^{CPP}}(\theta |{{D}_{0}},{{D}_{1}},{{\alpha }_{0}},{{\alpha }_{1}})\propto \\ & \,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,{{L}_{0}}{{(\theta |{{D}_{0}})}^{{{\alpha }_{0}}}}\pi (\theta ){{L}_{1}}{{(\theta |{{D}_{1}})}^{{{\alpha }_{1}}}} \\ \end{align}}}\]
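For concreteness, in a conjugate normal setting the CPP posterior has a closed form. The sketch below is my own illustration (not from the slides): with a \(N(\theta, \sigma^2)\) likelihood, known \(\sigma\), and a flat initial prior \(\pi(\theta) \propto 1\), raising \(L_0(\theta|D_0)\) to the power \(\alpha_0\) simply shrinks \(D_0\)'s effective sample size to \(\alpha_0 n_0\).

```python
import numpy as np

def cpp_posterior_normal(y0, y, alpha0, sigma=1.0):
    """Conjugate CPP posterior for a normal mean with known sigma and a
    flat initial prior: D0's likelihood is raised to the power alpha0,
    so D0 contributes an effective sample size of alpha0 * n0."""
    n0, n = len(y0), len(y)
    prec = (alpha0 * n0 + n) / sigma**2                       # posterior precision
    mean = (alpha0 * n0 * np.mean(y0) + n * np.mean(y)) / (alpha0 * n0 + n)
    return mean, 1.0 / np.sqrt(prec)                          # posterior mean, sd

rng = np.random.default_rng(0)
y0 = rng.normal(0.0, 1.0, 100)    # prior data D0
y = rng.normal(-0.25, 1.0, 100)   # current data D
m_full, s_full = cpp_posterior_normal(y0, y, alpha0=1.0)  # full borrowing
m_none, s_none = cpp_posterior_normal(y0, y, alpha0=0.0)  # no borrowing
```

Full borrowing (\(\alpha_0 = 1\)) yields a tighter posterior than no borrowing (\(\alpha_0 = 0\)), at the cost of pulling the posterior mean toward the prior data.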

Discount Prior Method (Haddad et al., 2017)

Prior given \(\scriptsize{D_0}\) (and \(\scriptsize{D}\))

\[\color{black}{\large{\pi ^{CPP}(\theta |{D_0},D) \propto L(\theta |{D_0})^{\color{red}{\alpha_{0}(D_0,\,D)}}\pi (\theta )}}\]

\[\large{\alpha_{0}(D_0,\,D) = 1 -\exp(-(\frac{p}{\lambda})^k) = F(p|\lambda, k)}\]

where


\(\large{p = P({\theta }<{{\theta }_{0}}|\,{{D}_{0}},\,{D})}\)

\(\large{\theta \sim \pi(\theta | D) \propto L(\theta | D)\pi(\theta)}\)

\(\large{\theta_0 \sim \pi(\theta_0 | D_0) \propto L(\theta_0 | D_0)\pi(\theta_0)}\)
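The discount step above can be sketched by Monte Carlo: draw \(\theta\) from the posterior given \(D\) alone, \(\theta_0\) from the posterior given \(D_0\) alone, estimate \(p = P(\theta < \theta_0)\), and pass \(p\) through the Weibull CDF. This is my conjugate-normal simplification (flat prior, known \(\sigma = 1\)); the \(\lambda = 3\), \(k = 0.65\) defaults are the values used later in the simulation slides, but they are design choices.

```python
import numpy as np

def discount_alpha0(y0, y, lam=3.0, k=0.65, draws=100_000, seed=1):
    """Stochastic-ordering discount (conjugate-normal sketch): draw theta
    from the flat-prior posterior given D alone and theta0 given D0 alone,
    estimate p = P(theta < theta0), then map p through a Weibull CDF."""
    rng = np.random.default_rng(seed)
    n0, n = len(y0), len(y)
    theta = rng.normal(np.mean(y), 1.0 / np.sqrt(n), draws)    # ~ pi(theta | D)
    theta0 = rng.normal(np.mean(y0), 1.0 / np.sqrt(n0), draws) # ~ pi(theta0 | D0)
    p = np.mean(theta < theta0)               # stochastic-ordering measure
    return 1.0 - np.exp(-((p / lam) ** k))    # Weibull CDF F(p | lam, k)

rng = np.random.default_rng(42)
y0 = rng.normal(0.0, 1.0, 100)    # prior data D0
y = rng.normal(-0.25, 1.0, 100)   # current data D
a0 = discount_alpha0(y0, y)       # amount of borrowing, in [0, 1]
```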

Comments on Discount Prior Method

  1. The current data are used twice, both in the prior and in the likelihood to obtain the posterior. This violates the Likelihood Principle.

  2. The effective prior sample size can change after seeing the current data. The procedure is not legitimately Bayesian because the prior is only fully determined a posteriori.

  3. A "stochastic ordering" measure of similarity may be influenced by the current data sample size so that more prior information is borrowed when the sample size is smaller.

Potential Variations to Discount Prior Methodology

Determining the power parameter from an external existing dataset \(\scriptsize{D_1}\)

Eliminates double use, with \(\scriptsize{\alpha_0}\) determined a priori

Conditional Power Prior becomes:

\[\color{black}{\large{{{\pi }^{CPP}}(\theta |{{D}_{0}},{{D}_{1}})\propto {{L}_{0}}{{(\theta |{{D}_{0}})}^{{{\alpha }_{0}}( {{D_0}},\, \color{red}{{D}_{1}} )}}\pi (\theta )} }\]

\[\color{black}{\,\,\,\,\,\,\,\,\,\,\large{0\le {{\alpha }_{0}}({{D}_{0}},{{D}_{1}})\le 1}}\]

\(\scriptsize{{{\alpha }_{0}}({{D}_{0}},{{D}_{1}})}\) is the proportion of down-weighting of the LH of \(\scriptsize{D_0}\) using information from both \(\scriptsize{D_0}\) and \(\scriptsize{D_1}\).

Determining the power parameter from interim current data (\(\scriptsize{D_1}\))

\(\scriptsize{{{\alpha }_{0}}({{D}_{0}},{{D}_{1}})}\) may not reflect a comparison between prior data (\(\scriptsize{D_0}\)) and current data, and \(\scriptsize{D_1}\) may not exist.

An alternative is to use interim information from the current data.

Suppose \(\scriptsize{D_1}\) is an initial subset of \(\scriptsize{D}\) of size \(\scriptsize{n_1}\), a pre-specified percentage of \(\scriptsize{n}\).
Let \(\scriptsize{D=\left[{{D}_{1}},{{D}_{2}}\right]}\), then the likelihood of \(\scriptsize{D}\) is

\[\small{L(\theta |D)=L(\theta |{{D}_{1}})\times L(\theta |{{D}_{2}})}\]

Mathematically, we can consider \(\scriptsize{D_1}\) as a "second" prior dataset because it comes before \(\scriptsize{D_2}\) and after \(\scriptsize{D_0}\).

Determining the power parameter from interim current data (\(\scriptsize{D_1}\))

The resulting prior conditions on \(\scriptsize{D_0}\) and \(\scriptsize{D_1}\), but only down-weights the LH for \(\scriptsize{D_0}\).

\[\color{black}{\large{\begin{align} & {{\pi }^{CPP}}(\theta |{{D}_{0}},{{D}_{1}})\propto {{L}_{0}}{{(\theta |{{D}_{0}})}^{{{\alpha }_{0}}({{D}_{0}},\color{red}{{D}_{1}})}}\pi (\theta ) \color{red}{L(\theta |{{D}_{1}})} \\ \end{align}}}\]

Posterior given current data \(\scriptsize{D_2}\):

\[\color{black}{\large{\begin{align} & \pi (\theta |{{D}_{0}},{{D}_{1}},{{D}_{2}})\propto \\ & \,\,\,\,\,\,\,\,\,\,\,\,\, L{{(\theta |{{D}_{0}})}^{{{\alpha }_{0}}({{D}_{0}},{{D}_{1}})}}\pi (\theta )L(\theta |{{D}_{1}}) \color{red}{L(\theta |{{D}_{2}})} \\ \end{align}}}\]  

Yesterday's posterior is today's prior
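The interim-data update above reduces, in the conjugate normal case, to the same closed form as before: only \(\alpha_0\) changes, since \(L(\theta|D_1)L(\theta|D_2) = L(\theta|D)\) enters at full weight. A sketch (my simplification, flat initial prior, known \(\sigma\); `alpha0_fn` is a placeholder for any discount rule applied to \((D_0, D_1)\)):

```python
import numpy as np

def interim_cpp_posterior(y0, y, frac=0.5, alpha0_fn=None, sigma=1.0):
    """Interim-data variant (conjugate-normal sketch, flat initial prior):
    alpha0 is set from (D0, D1) only, where D1 is the first `frac` of D,
    but the full current likelihood L(theta|D1) * L(theta|D2) = L(theta|D)
    still enters the posterior at weight 1."""
    n0, n = len(y0), len(y)
    n1 = int(round(frac * n))
    y1 = y[:n1]                                    # interim current data D1
    alpha0 = alpha0_fn(y0, y1) if alpha0_fn is not None else 1.0
    prec = (alpha0 * n0 + n) / sigma**2            # posterior precision
    mean = (alpha0 * n0 * np.mean(y0) + n * np.mean(y)) / (alpha0 * n0 + n)
    return mean, 1.0 / np.sqrt(prec), alpha0
```

Note that \(D_2\) never touches the discount: a different realization of the late data changes the posterior but not \(\alpha_0\).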

Determining the power parameter from interim current data (\(\scriptsize{D_1}\))

Things to note:

  • \(\scriptsize{{{\alpha }_{0}}({{D}_{0}},{{D}_{1}})}\) is not a function of the entire current data \(\scriptsize{D}\). The most recent data \(\scriptsize{{{D}_{2}}}\) is not included in the prior.

  • Modified CPP is a legitimate prior on \(\scriptsize{\theta}\).

  • We do use the subset \(\scriptsize{{{D}_{1}}}\) twice for making inference on \(\scriptsize{\theta}\)

  • If \(\scriptsize{{{D}_{1}}}\) were small enough to be considered not essential for inference, we could eliminate \(\scriptsize{L(\theta |{{D}_{1}})}\) from the posterior. However, we never want to discard any current data.

Measuring Similarity of Prior and Current Data (Stochastic Ordering measure)

Hypothetical posterior distributions of \(\scriptsize{\theta}\) and \(\scriptsize{{{\theta }_{0}}}\) for a success rate, with current data better than prior data. Shaded area is \(\color{black}{\scriptsize{P(\theta >{{\theta }_{0}}|{{D}_{0}},D)}}\).

More is borrowed when the current data sample size is higher.

Hypothetical posterior distributions of \(\scriptsize{\theta}\) and \(\scriptsize{{{\theta }_{0}}}\) for a success rate, with current data worse than prior data. Shaded area is \(\scriptsize{P(\theta >{{\theta }_{0}}|{{D}_{0}},D)}\).

More is borrowed when the current data sample size is lower (anti-conservative?).

Alternative Measure of (Dis)similarity:
Kolmogorov-Smirnov Statistic (KS)

\(\color{black}{\small{KS = D_{n,m} = \sup\limits_{x}|F_{0,n}(x) - F_{1,m}(x)|}}\), where \(\small{F_{0,n}}\) and \(\small{F_{1,m}}\) are the empirical CDFs of the prior and current samples.
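The two-sample KS statistic is straightforward to compute directly: evaluate both empirical CDFs at every pooled sample point and take the largest gap. A minimal sketch:

```python
import numpy as np

def ks_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov statistic: the largest vertical gap
    between the two empirical CDFs, checked at every pooled sample point."""
    grid = np.sort(np.concatenate([x, y]))
    ecdf_x = np.searchsorted(np.sort(x), grid, side="right") / len(x)
    ecdf_y = np.searchsorted(np.sort(y), grid, side="right") / len(y)
    return float(np.max(np.abs(ecdf_x - ecdf_y)))
```

`scipy.stats.ks_2samp` computes the same statistic (plus a p-value); the hand-rolled version just makes the \(\sup\) over the pooled points explicit.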

Simulation Study

Prior data: \(\scriptsize{D_0 \sim N(0,1)}\) with sample size \(\scriptsize{n_0}\) = 100 (or 50).

Current data: normal mean values \(\scriptsize{{{\theta }^{*}} \in \{-1, -0.75, -0.5, -0.25, -0.1, 0\}}\).

For each \(\scriptsize{{{\theta }^{*}}},\) we simulated 15,000 current data sets D from a \(\scriptsize{N({{\theta }^{*}}, 1)}\) distribution with size \(\scriptsize{n}\) = 100 (or 50).

The percent of \(\scriptsize{D}\) used to form \(\scriptsize{D_1}\): \(\scriptsize{100\%, 75\%, 50\%, 25\%}\).


Similarity measure: KS statistic or stochastic ordering measure.

Simulation Study

Operating Characteristics: Type I error rate is computed under \(\scriptsize{H_0: \theta \le l}\), where \(\scriptsize{l \in \{-1, \ldots, -0.25, -0.10\}}\).

\(\scriptsize{H_0}\) is rejected if posterior probability of \(\scriptsize{H_0}\) is less than 0.025.

Power is computed for testing \(\scriptsize{H_0:\theta \le -0.5}\) at alternative values \(\scriptsize{{{\theta }^{*}} \in \{-0.35, -0.25, -0.10, 0\}}\).

The maximum value of \(\scriptsize{\alpha_0}\) allowed is 1.0.
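A rough sketch of one cell of this simulation, under my own simplifications (known \(\sigma = 1\), flat initial prior, the KS-based Weibull discount form from the next slide, and \(H_0\) rejected when its posterior probability falls below 0.025):

```python
import math
import numpy as np

def simulate_type1(theta_null=-0.5, n0=100, n=100, frac=0.5,
                   reps=1000, seed=0):
    """Sketch of the slides' setup: prior data ~ N(0,1), current data at
    the null mean, borrowing set by the KS-based Weibull discount
    1 - exp(-(((1 - KS)/12))**0.9) estimated from the interim fraction of
    the current data; reject H0: theta <= theta_null if P(H0|data) < 0.025."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(reps):
        y0 = rng.normal(0.0, 1.0, n0)
        y = rng.normal(theta_null, 1.0, n)
        y1 = y[: int(frac * n)]                  # interim current data D1
        # two-sample KS between D0 and D1
        grid = np.sort(np.concatenate([y0, y1]))
        ks = np.max(np.abs(
            np.searchsorted(np.sort(y0), grid, side="right") / n0
            - np.searchsorted(np.sort(y1), grid, side="right") / len(y1)))
        a0 = 1.0 - math.exp(-(((1.0 - ks) / 12.0) ** 0.9))
        prec = a0 * n0 + n                       # sigma = 1, flat prior
        mean = (a0 * n0 * y0.mean() + n * y.mean()) / (a0 * n0 + n)
        # posterior is N(mean, 1/prec); P(H0) = P(theta <= theta_null)
        z = (theta_null - mean) * math.sqrt(prec)
        p_h0 = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
        rejections += p_h0 < 0.025
    return rejections / reps
```

Because the prior mean (0) sits above the null value, any borrowing pulls the posterior upward, so the realized type I error rate exceeds the nominal 0.025; the simulations quantify how much.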

Form of \(\scriptsize{{{\alpha }_{0}}({{D}_{0}},{{D}_{1}})}\) for Simulations

KS measure: \(\scriptsize{ 1 - \exp(-(\frac{1-KS}{12})^{0.9})}\)

Stochastic ordering measure: \(\scriptsize{1 - \exp(-(\frac{p}{3})^{0.65})}\)
Weibull parameters should be pre-specified at the design stage.
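As code, the two discount functions used in the simulations are one-liners; note that the KS form feeds the *similarity* \(1 - KS\) into the Weibull CDF, while the stochastic-ordering form feeds \(p\) directly:

```python
import math

def alpha0_ks(ks):
    """Weibull discount of the KS statistic (slides' simulation form):
    more dissimilar samples (larger KS) borrow less."""
    return 1.0 - math.exp(-(((1.0 - ks) / 12.0) ** 0.9))

def alpha0_so(p):
    """Weibull discount of the stochastic-ordering measure p (slides'
    simulation form): larger p borrows more."""
    return 1.0 - math.exp(-((p / 3.0) ** 0.65))
```

Both map into \([0, 1)\), and the Weibull parameters \((\lambda, k)\) are fixed at the design stage, as the slide emphasizes.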

\(\scriptsize{\color{black}{\alpha _0(D_0,D_1)}}\) by \(\scriptsize{{{\theta }^{*}}}\) (prior mean = 0 and \(\scriptsize{\color{black}{n_0 = n} = 100}\))

Type I Error Rate by \(\color{black}{\scriptsize{{{\theta }^{*}}}}\) (null value) when \(\color{black}{\scriptsize{n = n_0 = 100}}\)

Type I Error Rate by \(\color{black}{\scriptsize{{{\theta }^{*}}}}\) (null value) when \(\scriptsize{\color{black}{n = n_0 = 100}}\) and max \(\scriptsize{\alpha_0(D_0, D_1) = 0.5}\) using KS

Power by \(\scriptsize{{{\theta }^{*}}}\) when \(\scriptsize{\color{black}{n = n_0 = 100}}\)

Posterior SD of \(\scriptsize{\color{black}{\theta }}\) using \(\scriptsize{\color{black}{D_1}}\) Twice or Once (KS measure)

Summary of variations on discount prior method

  • We have suggested a variation for the discount prior method so that there is less “double use” of the full set of current data in the final posterior.

    • We partition D into two sets, using an initial subset, \(\scriptsize{D_1}\) to estimate \(\scriptsize{\alpha_0(D_0,D_1)}\).

    • If \(\scriptsize{D_1}\) is considered a second prior dataset, then inference on \(\scriptsize{\theta}\) (after observing \(\scriptsize{D_2}\)) is legitimately Bayesian.

    • We have not demonstrated how to decide what percentage of \(\scriptsize{D}\) should be used in \(\scriptsize{D_1}\).

  • We suggest alternative similarity measures to the “stochastic ordering” measure that may depend less on posterior variance, if needed.

Summary of results of the modified methods

  • Using the KS similarity measure (with a Weibull discount function):
    • Type I error rate was generally lower when an interim data percentage was used to estimate the borrowing fraction, as opposed to using all of the current data at the end of the study.

    • Power was higher when using 100% of the current data (vs less than 100%) to estimate the amount of borrowing as well as to make an inference at the end of the study.

  • Using the stochastic ordering measure (with a different Weibull discount function) showed different patterns: lower type I error rates, but also lower power.

  • Discarding \(\scriptsize{D_1}\) decreases efficiency (increases posterior variance), eliminates useful information, and can increase bias.

  • Simulations showed scant deflation of posterior variance after representing the information from \(\scriptsize{D_1}\) twice in the posterior distribution of \(\scriptsize{\theta}\).

Backup Slides

Limitations

  • The power prior method assumes patient-level exchangeability, which may be too strong. Covariates may be needed to calibrate the prior and current datasets even when the power parameter is less than 1.0.

  • One shouldn’t base the amount of borrowing solely on observed outcome responses because outcome responses may be influenced too much by sampling variability.

  • The KS statistic would not differentiate between similarity where the current data are slightly better than the prior and similarity where the current data are slightly worse.

    • The stochastic ordering measure would differentiate these two situations.

  • Ultimately, investigating design characteristics is necessary to ensure a reasonable trial design.

Type I Error Rate by \(\scriptsize{{{\theta }^{*}}}\) (null value) when \(\scriptsize{\bf{n = n_0 = 50}}\)

Samples from the Posterior of \(\scriptsize{\bf{\theta}}\) when \(\scriptsize{\bf{D_1} = 75\%}\) of D is Kept (top) and Discarded (bottom)
\(\scriptsize{H_0}\)
\(\scriptsize{H_1}\)
\(\tiny{\bf{(\bar{\theta} = -0.04)}}\)
\(\tiny{\bf{(\bar{\theta} = +0.05)}}\)

Other Measures of Similarity in the Literature

1. Gravestock & Held (2017) maximize the marginal likelihood of \(\scriptsize{\alpha_0}\):

\[\small{\hat{\alpha}_0^{ML}= \underset{\alpha_0}{\mathrm{arg\,max}}\; L(\alpha_0 | D_0, D), \quad L(\alpha_0 | D_0, D) = \int L(\theta|D)\pi(\theta | D_0, \alpha_0) d\theta} \]

2. Nikolakopoulos et al. (2017) use the prior-predictive p-value:

\[\small{ppp(D_0, \hat{\alpha}_0) = 2 \min(P_{D|D_0, \hat{\alpha}_0}(T(D) \ge T(D^{obs})), \\ \,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\, P_{D|D_0, \hat{\alpha}_0}(T(D) \le T(D^{obs})))}\]

where \(\scriptsize{\hat{\alpha}_0 = \min[\max(\hat{\alpha}_0: ppp(D_0, \hat{\alpha}_0) \ge c), 1] }\).

  • Uses current data in the estimation of \(\scriptsize{\hat{\alpha}_0}\). Could be determined with interim current data (though not mentioned in paper).
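To show the shape of the prior-predictive rule, here is a toy conjugate-normal illustration of my own, not the paper's procedure: for each candidate \(\alpha_0\), the power prior from \(D_0\) implies a predictive distribution for the current sample mean; keep the largest \(\alpha_0\) whose two-sided prior-predictive p-value for the observed mean stays above a threshold \(c\).

```python
import math

def ppp_alpha0(mean0, n0, mean_d, n, c=0.1, grid=1000):
    """Toy version of a prior-predictive p-value rule (conjugate normal,
    sigma = 1, flat initial prior): under the power prior from D0,
    theta ~ N(mean0, 1/(alpha0*n0)), so the predictive distribution of
    the current sample mean is N(mean0, 1/(alpha0*n0) + 1/n). Return the
    largest alpha0 on a grid whose two-sided predictive p-value >= c."""
    best = 0.0
    for i in range(1, grid + 1):
        a = i / grid
        sd = math.sqrt(1.0 / (a * n0) + 1.0 / n)  # predictive sd of ybar
        z = (mean_d - mean0) / sd
        tail = 0.5 * (1.0 - math.erf(abs(z) / math.sqrt(2.0)))
        if 2.0 * tail >= c:
            best = a
    return best
```

When the current mean matches the prior mean the rule allows full borrowing; as the means diverge, smaller \(\alpha_0\) values are needed to keep the observed data plausible under the prior predictive.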

Posterior SD of \(\scriptsize{\bf{{{\theta }}}}\) Keeping or Discarding \(\scriptsize{\color{black}{D_1}}\)

Bias of Posterior Mean of \(\scriptsize{\bf{{{\theta }}}}\) Keeping or Discarding \(\scriptsize{\bf{D_1}}\)

\(\scriptsize{\bf{\alpha _0(D_0,D_1)}}\) by \(\scriptsize{{{\theta }^{*}}}\) (prior mean = 0 and \(\scriptsize{\bf{n_0 = n}}\) = 50)

\(\scriptsize{\bf{\alpha _0(D_0,D_1)}}\) by \(\scriptsize{{{\theta }^{*}}}\) (prior mean = 0 and \(\scriptsize{\bf{n_0 = n}}\) = 25)

Type I Error Rate by \(\scriptsize{{{\theta }^{*}}}\) (null value) when \(\bf{\scriptsize{n = n_0 = 25}}\)

Power by \(\scriptsize{\bf{{{\theta }^{*}}}}\) when \(\scriptsize{\bf{n = n_0 = 50}}\)

Results from Nikolakopoulos et al.

\(\scriptsize{\bf{\alpha _0(D_0,D_1)}}\) by \(\scriptsize{{{\theta }_0}}\)
(\(\scriptsize{{{\theta }}}\) = 0 and \(\scriptsize{\bf{n_0 = n}}\) = 100)

Type I Error Rate by \(\scriptsize{{{\theta }_0}}\)
(\(\scriptsize{{{\theta }}}\) = 0 and \(\scriptsize{\bf{n_0 = n}}\) = 100)

Power by \(\scriptsize{{{\theta }^{*}}}\) for \(\scriptsize{{{H_0:\theta=0}}}\)
(\(\scriptsize{{{\theta_0}}}\) = 0.25 and \(\scriptsize{\bf{n_0 = n}}\) = 100)